Effect of Climate on Photovoltaic Yield Prediction Using Machine Learning Models

Abstract. Machine learning is emerging as a major solution for photovoltaic (PV) power prediction. Despite the abundant literature, the effect of climate on yield predictions made with machine learning is unknown. This work aims to find climatic trends by predicting the power of 48 PV systems around the world, equally divided into four climates. An extensive data gathering process is performed, in which open-data sources are prioritized. A website, www.tudelft.nl/open-source-pv-power-databases, has been created that lists all open data sources found, for use in future research. Five machine learning algorithms and a baseline algorithm have been trained for each PV system. Results show that the performance ranking of the algorithms is independent of climate. Systems in dry climates show on average the lowest Normalized Root Mean Squared Error (NRMSE), 47.6 %, while those in tropical climates show the highest, 60.2 %. In mild and continental climates the NRMSE is 51.6 % and 54.5 %, respectively. When a model trained in one climate is used to predict the power of a system located in another climate, systems located in cold climates show on average a lower generalization error, with an additional NRMSE as low as 5.6 % depending on the climate of the test set. Robustness evaluations, which strengthen the validity of the results, were also conducted.
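To make the two reported quantities concrete, the following is a minimal sketch of how an NRMSE and an "additional NRMSE" (cross-climate error minus same-climate error) could be computed. The abstract does not state how the RMSE is normalized; dividing by the mean of the measured power is an assumption made here for illustration only. The function name `nrmse` and all numbers are hypothetical and are not taken from the study.

```python
import math


def nrmse(measured, predicted):
    """Return the RMSE divided by the mean measured value, in percent.

    Normalizing by the mean of the measurements is an assumption of this
    sketch; the source text does not specify the normalization.
    """
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured)
    mean_measured = sum(measured) / len(measured)
    if mean_measured == 0:
        raise ValueError("mean of measured values is zero; NRMSE undefined")
    return 100.0 * math.sqrt(mse) / mean_measured


# Made-up power values (kW) for one hypothetical PV system.
measured = [0.0, 1.0, 2.0, 3.0]
# Predictions from a model trained on the same climate (hypothetical).
pred_same_climate = [0.0, 1.5, 2.0, 2.5]
# Predictions from a model trained on another climate (hypothetical).
pred_other_climate = [0.5, 1.5, 2.5, 2.0]

nrmse_same = nrmse(measured, pred_same_climate)
nrmse_other = nrmse(measured, pred_other_climate)

# "Additional NRMSE": extra error incurred by the cross-climate model.
additional_nrmse = nrmse_other - nrmse_same

print(f"same-climate NRMSE:  {nrmse_same:.1f} %")
print(f"cross-climate NRMSE: {nrmse_other:.1f} %")
print(f"additional NRMSE:    {additional_nrmse:.1f} %")
```

With a mean-based normalization, values above 100 % are possible when the errors are large relative to the average output, so results are only comparable between studies that use the same normalization.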

• The Global Energy Forecasting Competition (GEFCom) [4] conducted by Dr. Tao Hong considered the PV power forecasting problem in the 2014 edition. The employed data set is publicly available for researchers.
• Marion et al. made their data set of three American PV systems openly available. In their publication [5] one can find how to access the data.
• DuraMAT DataHub, also by NREL, is a collaborative framework where the public can provide and access PV data for durability studies [6]. The website has not been explored in depth, but a data set of 9 bifacial tracking systems was found for download.
• The London Datastore open data-sharing portal [7] includes the Photovoltaic (PV) Solar Panel Energy Generation data set, which contains voltage, current, power, energy and weather data from domestic sites with solar panels located in the city of London.
• Kaggle is an online machine learning community, a subsidiary of Google, which allows users to find and publish data sets and to build models in a web-based data-science environment [8]. It hosts several data sets related to PV power, which are listed at www.tudelft.nl/open-source-pv-power-databases. For instance, the Horizontal Photovoltaic Power Output Data data set was employed in an Energies publication [9]. It includes PV power and weather data for 12 Northern Hemisphere sites over 14 months.
• Although generally not recommended, since it is meant for software development and version control, GitHub is also employed to store data sets. One example is a data set of three individual systems in India, with over 6 months of data at one-minute resolution [11].
• The Hathaway Solar Patriot House is a project that monitors and models the performance of a sustainable house outside of Washington, D.C., with a 6 kWp photovoltaic system [12]. The collected data has been made available and includes a large number of measurements, such as solar irradiance, DC electrical measurements from the PV array and battery, energy from and to the grid, energy consumption of electrical appliances, several temperatures and other measurements.
• The data provided by researchers Dr. Victor Vega from the University of Costa Rica (UCR) and Prof. Janez Krč from the University of Ljubljana can be directly downloaded from the developed website www.tudelft.nl/open-source-pv-power-databases.
• To access the data of the two Finnish systems, Anders Lindfors from the Finnish Meteorological Institute (FMI) should be contacted.
For researchers interested in cumulative production instead of individual systems, there are also several choices:
• Belgium's electricity system operator, Elia, provides open access to all of its public grid data [13]. Power generation data, including generated PV power, can be found for the whole country or by area.
• Similarly, the German electricity market information platform SMARD provides actual generation for Germany, Austria and Luxembourg with 15-minute resolution [14].
• Outside of Europe, the Electric Power Statistics Information System (in Korean) provides monthly data of power generation sorted by fuel type [15].
• Paul-Frederik Bach took the initiative of collecting data from several European system operators. One can find hourly time series of cumulative production in his blog [16].
• Also for Europe, the Open Power System Data platform offers open data required by energy system models, including time series of solar power generation with up to 15-minute resolution [17].
Other interesting sources, although not fully open, are PV CAMPER and pvoutput.org. PV CAMPER (Photovoltaic Collaborative to Advance Multi-climate Performance and Energy Research) [18] is a collaborative platform of PV institutions that share data within the community. There are entry requirements to join the community, but the platform was founded with the objective of data sharing and collaboration. pvoutput.org is a free service for sharing and comparing PV output data. Through its API one can retrieve generation data from any system, after having shared data from an owned PV system and made a donation of at least 15 AUD per year.